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Abstract 

We develop a new method for frequentist multiple testing with Bayesian prior information. 

Our procedure finds a new set of optimal p-value weights called the Bayes weights. Prior 
information is relevant to many multiple testing problems. Existing methods assume fixed, 
known effect sizes available from previous studies. However, the case of uncertain information is 
usually the norm. For a Gaussian prior on effect sizes, we show that finding the optimal weights 
is a non-convex problem. Despite the non-convexity, we give an efficient algorithm that solves 
this problem nearly exactly. We show that our method can discover new loci in genome-wide 
association studies. On several data sets it compares favorably to other methods. Open source 
code is available. 
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1 Introduction 

We are motivated by the genetics of human longevity. Genome-wide association studies of longevity 
compare long-lived individuals (e.g. centenarians who live to 100 or older) to matched controls 
(Brooks-Wilson, 2013). More than 500,000 genetic variants are tested for association to longevity. 
This is a large multiple testing problem. In addition to the multiplicity, the sample size is low - 
usually a few hundred. As a consequence, only a few loci have been replicably associated to human 
longevity. They do not explain the heritability of the trait (Hjelmborg et ah, 2006). 

The multiplicity may be countered by testing only a few candidate variants selected based on 
prior scientific knowledge. In a separate work in preparation, led by Dr. Kristen Fortney, we 
find that a more general genome-wide test helps improve power in longevity. We leverage prior 
information from genome-wide association studies of age-related diseases, such as coronary artery 
disease and diabetes. For this task, we develop a new method of frequentist multiple testing with 
Bayesian prior information. In this paper we provide the theory for this method. 

Our method is a type of p-value weighting scheme. P-value weighting is a general methodology 
for multiple testing that leverages independent prior information to improve power (see Roeder and 
Wasserman (2009); Gui et al. (2012) for a review). Suppose we test hypotheses Hi i — 1,2,... J, 
via the p-values Pi. For a significance level q G [0,1], the weighted Bonferroni method declares the 
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i-th hypothesis significant if Pi < qwi. The weights Wi > 0 are based on independent data. For 
weights averaging to 1, the family-wise error rate (the probability of making at least one error) is 
controlled strongly at cr = Jg. 

Previous work has found the optimal weights in a Gaussian model of hypothesis testing. Let 
the test statistics in the current study be Ti ^ 1), where jii are the means^ or effect sizes^ 

and test the null hypotheses fii > 0 against fii < 0. We have some information about fii from 
prior studies. The works Wasserman and Roeder (2006); Roeder and Wasserman (2009); Rubin 
et al. (2006) consider the model where fii is known exactly from the prior data, and the weights 
are allowed to depend on /i^. In this model they find the optimal weights maximizing the expected 
number of discoveries. We show this amounts to solving a convex optimization problem, but this 
was not used originally. 

The assumption that jii are known precisely is problematic: if they were known, there should be 
no follow-up study. Instead, we account for uncertainty by considering the model jii Af{r]i,af). 
Here only the prior mean rji and standard error ai are known from independent data, not the 
precise value of the effect sizes. However, finding the optimal weights, which we call Bayes weights, 
becomes a non-convex optimization problem. Westfall et al. (1998) use a direct numerical solver, 
and optimize for at most 4 tests. 

We give an efficient method to find the optimal weights for a large number of tests. We solve 
the optimization problem exactly for small q. For larger g, we can solve it for a nearby g* such that 
1^* —^1 < 1/(2-^)* The cost in the first case is 0(J), the cost in the second case is 0(Jlog J). These 
are the costs per iteration of our optimization algorithm. We observe a near constant number of 
iterations used. The solution of non-convex optimization problems is challenging in general, thus 
it is perhaps remarkable that this problem admits a nearly exact solution. 

This enables a new methodology for large-scale multiple testing that controls a frequentist error 
measure, while also taking into account Bayesian prior information. This method follows George 
Box’s advice to be Bayesian when predicting but frequentist when testing (Box, 1980). While this 
methodology was considered previously for a few tests (Westfall et ah, 1998), we are the first to do 
it on a large scale. 

When prior information is uncertain, we show in simulations that the new scheme does better 
than its competitors. We also show theoretically, in a sparse mixture model, that weighting leads to 
substantially improved power. We apply our method to genome-wide association studies (GWAS). 
By analyzing several GWAS data sets we show its advantages compared to other methods. 

This method should be useful for other problems in biology and elsewhere. Open source code is 
available from the authors. All our computational results are reproducible (see the Supplement). 

The contents of the paper are as follows: We discuss related work in Section 2. We develop 
the theory in Section 3 and present simulations comparing our method to alternatives in Section 
4. We apply our method to genome-wide association studies in Section 5 and use it to analyze 
GWAS data in Section 6. The supplementary material contains a description of available software 
and instructions to reproduce our computational results (Section 8), as well as mathematical proofs 
(Section 9). 

2 Related Work 

There is a large literature on related statistical methods for multiple testing with prior information. 
In early work, Spjotvoll (1972) devised optimal multiple testing procedures treating tests unequally. 
Later it was recognized that Spjotvoll’s results are equivalent to optimal p-value weighting methods. 
For instance, Benjamini and Hochberg (1997) developed extensions of Spjotvoll’s methods for p- 
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value weighting. 

Leveraging Spjotvoll’s results, Wasserman and Roeder (2006); Roeder and Wasserman (2009); 
Rubin et al. (2006) found the explicit formula for optimal weights in the Gaussian model 1), 

assuming the effects are known exactly. This lead to an efficient method suitable for large appli¬ 
cations. In the bioinformatics community Eskin (2008); Darnell et al. (2012) applied Wasserman 
and Roeder (2006) ’s framework to GWAS. They accounted for correlations between the tests but 
assumed the effects are known exactly. 

The simple method of testing the top candidates from a prior study is also popular. This is 
often known as two-stage testing or as a candidate study. A specific version for GWAS has been 
called the proxy-phenotype method (Rietveld et ah, 2014). Using only the top candidates runs the 
risk of discarding many potentially useful hypotheses. 

A missing ingredient is taking uncertainty into account. Westfall et al. (1998) considered a 
Gaussian model 1) for hypothesis testing where prior distributions are known for the means. 

However, their optimization methods could handle only a small number (J = 4) of tests. 

3 Theoretical results 

3.1 Background 

In this section we present our theoretical results. As background, we begin with the case of known 
means. We work in the Gaussian means model of hypothesis testing: We observe test statistics 
Ti ^ 1) and test each null hypothesis Hi : jii > {) against jii < 0. The p-value for testing Hi 

is Pi = 4>(r^), where $ is the normal cumulative distribution function. 

For a weight vector w G [0,00)^^ and significance level q G [0,1], the weighted Bonferroni 
procedure rejects Hi if Pi < qwi. Usual Bonferroni corresponds to Wi = 1. Then the expected 
number of false rejections, known as the per-family error rate, equals: ^ = 

q '^ 2 - Therefore, if ^ fhe expected number of false rejections is controlled strongly, 

under any configuration of truth or falsehood of at a Jq. By Markov’s inequality this implies 
that the family-wise error rate, the probability of any false rejection, is also controlled at a. This 
does not need independence of the Ti. We always assume that q < and usually g <C 1. Without 
loss of generality we restrict the weights to [0, l/q]> 

Let us denote the number of rejections by R{w) = ^ where /(•) is the indicator 

function. The optimal weights in this model were found explicitly by Wasserman and Roeder 
(2006); Roeder and Wasserman (2009) and independently by Rubin et al. (2006). Denoting by 
Et{’) expectation with respect to Ti^ they solve the constrained optimization problem 

J 

max Et{R{w)} subject to Wi = J. (1) 

we[0,l/qy 

It was not noted in these works that this problem is convex. Usually convex programs are about 
minimization of convex functions. This is equivalent to maximizing concave functions. In our case 
the objective is a sum of terms of the form ^{^~^{qwi) — /i^}, whose concavity follows directly by 
differentiation. Yet, by simple Langrangian optimization, the above papers show that if all fii < 0, 
the optimal weights are Wi = u;(/i^), where 


=■* (f+ 


( 2 ) 
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Here c is the unique normalizing constant such that the weights sum to J. Interestingly, the 
weights are not monotonic as a function of /i, but maximal for intermediate values of ji (Roeder 
and Wasserman, 2009). 

As noted by Roeder and Wasserman (2009) the formula is a direct consequence of Spjotvoll’s 
theory of optimality in multiple testing (Spjotvoll, 1972). Accordingly, we call these the Spjotvoll 
weights. 


3.2 Weighting leads to a substantial power gain 

To illustrate theoretically that weighting can lead to increased power, we compare the power of 
optimal weighting and unweighted testing in a sparse mixture model. 

P-value weighting exploits the heterogeneity of the tests. In the simplest case there are only 
large and small effects, say M <C m 0. We want the m ^ 0 limit, and for simplicity we suppose 
m = 0. Similar results hold if m ^ 0. Let the fraction of large and small means be 7ri,7ro > 0, so 
that TTi J means are M, and the remaining ttqJ are 0. We solve below for the optimal weights. 

Proposition 3.1 (Optimal weights for sparse means). There is a set of optimal weights that gives 
the same weight to equal means, wq and wi to 0 and M. These are: 

, , f (0, I/tti) if 7 ri$(-|M|/ 2 ) > g, 

Further, the power p* of the optimal p-value weighting method is: 

P {^1, ,(1) \ g + 7ri{$(|M|/2)-$(-|M|/2)} if 7ri^-\M\/2)<q. 

If \M\ is small enough that 7 ri$(—|M|/2) > q, all the weight is placed on the larger means. 
This is the behavior we expect intuitively. However, if |M| is large enough that 7 ri$( —|M|/2) < q, 
then it is advantageous to place some weight on the small means. The reason is that such a large 
\M\ will be detected with very high probability. 

The power of unweighted Bonferroni is Punif(^i? ^5 q) — tvqQ + 7 ri${$“^(g) + \M\}. 

In Figure 1 (a), we plot the ratio of p*( 7 ri, M, g)/punif( 7 ri, M, g) for q = 10“^ for a range of M 
and TTi. We observe that for most effect sizes M G [—2.5,—0.25], and tti < 0.4, we get a power 
boost of at least 50%. Further, there is a hotspot where the power gain can be 3-4 fold. Thus, 
optimal weighting can lead to a significant boost in power. 


3.3 Weights with imperfect prior knowledge 

In the previous sections it was assumed that the effects jif were known precisely. We will instead 
assume that we have uncertain prior information /i^ ^ about them. 

We maximize the expected power Eij,[Et{R{w)}]. The expectation is with respect to the random 
Ti and fif. Introducing 7 ^ = (cr? + 1)^^^, the Bayes weights problem becomes: 


max 

we[0,l/qy 


E* 


$ ^{qwi 


Vi 


li 


subject to Wi = J. 


i=l 


(3) 


This objective is not concave if any 7 ^ > 1. To help with visualization, the function w 
^[{^~^{qw) — p}/ 7 ] is plotted in Figure 1 (b) for four parameter pairs The function is 

increasing, and its curvature has two intervals: concavity, followed by convexity. 
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(a) Power gain. 



Figure 1 : (a) Contour plot of power ratio of optimal and unweighted testing for sparse means, (b) 
Plots of four different instances of the function that is summed in the optimization objective. The 
non-convex function w —rj}/^] with q = 0.05 is plotted for the following pairs ( 77 , a): 

(—0 • 1 , 1 ) solid, (—0 • 1 , 2 ) dashed, ( — 1 , 1 ) dotted, (— 1 , 2 ) dot-dashed. 


Our key contribution is to solve this problem efficiently for large J. Our results here are twofold. 
First, we can solve the problem exactly in the special case when q is sufficiently small. Second, we 
have a nearly exact solution for arbitrary q. We start with the simpler first case. Let us define 


c{r],rA) = - 


V + + 2 (7^ - 1) log(7A)}^/^ 

7 ^ — 1 


(4) 


It turns out that c is the optimal critical value when 


(5) 

i=l 

In our data analysis examples and simulations, this upper bound requires that q be below values 
in the range 0 • 1 — 0 • 3. In the next result we find the exact optimal weights for small q when all 

ai > 0 . 

Theorem 3.2 (Form of Bayes weights). If the significance level q G [0,1] is small enough that (5) 
holds then the optimal Bayes weights maximizing the average power (3) are Wi = w{r]i^ji]X) — 

^{c{r]i^ji;X)}/q, where X>1 is the unique constant such that '^(^O 7 b ^) = J- 

In the supplementary material, we solve this problem by maximizing the Lagrangian. The two 
key properties are the joint separability of the objective and constraint; and the analytic tractability 
of the Gaussian. 

Figure 2 shows surface and contour plots of an instance of the optimal weights w{r],a)^ as a 
function of the mean g and standard deviation a. In the theorem the weights are a function of ( 77 , 7 ). 
Here and below, we will often view them as a function of ( 77 ,( 7 ), via the natural map 7 ^ = (j^ + 1. 

As the standard error a becomes small, our weights tend to the Spjotvoll weights: 

Proposition 3.3 (Recovering the Spjotvoll weights). For any X and 77 < 0, the Bayes weight 
function defined by w{g^^]X) = ${ 0 ( 77 , 7 ; A)}/g tends to the Spjotvoll weight defined in ( 2 ) as 
( 7 ^ 0 . 
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Figure 2: Bayes weights: (a) surface and (b) contour plots of the Bayes weight function w{r]^a) 
defined in Theorem 3.2. Spjotvoll weights are on the segment a = 0, 77 < 0. 


With > 0, the weights are regularized: more extreme weights are shrunk towards a common 
value in a nonlinear way. For finite ai our weights can be viewed as a smooth interpolation between 
Spjotvoll and uniform weights. It is reasonable to think at first that as all ai 00 , the best 
weight allocation is uniform. This is not the case. As we will show below, an interesting symmetry 
breaking phenomenon occurs. 

Consider a weight vector w that equals 1/q for [Jq] indices, and assume Jq is not an integer. 
Distribute the remaining strictly positive weight equally among the remaining hypotheses. Now it 
is easy to see that for the 1/q weights we always reject, thus the power equals 1. For the remaining 
weights the objective ^[{^~^{qw) — rj}/^] tends to $(0) = 1/2 as a ^ oc (i.e. 7 ^ 00 ). This 
shows that the limiting power is [Jq] + (J — [J^J)/2 = (J + [JgJ)/2. This is larger than J/2, the 
limit power of uniform weighting! This illustrates the symmetry breaking phenomenon caused by 
the extreme non-convexity of the optimization problem. 

Fortunately the situation is better as long as condition (5) holds. This condition is easy to check 
for any given parameters q and ( 77 ^, a^), i = 1,..., J. In addition we will now show theoretically 
that the constraint is mild. Often, even if J is large, we want to keep a — Jq small, because a is 
the number of false rejections we tolerate. In this regime the condition holds as long as there are 
a few average-sized negative prior means rji (denote Zc = $“^(c)). 

Proposition 3.4 (Simple condition). Condition (5) is true (and so Theorem 3.2 applies) if there 
are K distinet indiees i with negative rg, sueh that 7^ log(7?)/|2;Q;/K| < \rii\ < 

If (a/A" ^ 0, it is well-known that asymptotically \Zf^ix\ {2\og{K / , SO the simple 

condition holds as long as: 7 /log( 7 /){ 2 log(i^/(a)}“^/^ < \rii\ < {2\og{K/. This is a quite 
weak requirement. For instance, if iF = 10, a = 0 • 01, then {2 log(iF/(a)}^/^ approximately equals 
3.7. Continuing this example, if cr = 1, so that 7 ^ = 2 , and 7 ^ log( 7 ^)/ 3.7 = 0.16, then we only 
need 10 hypotheses with 0.16 < I 77 I < 3.7. 

When q is small, we use Newton’s method to find the right constant A from the theorem via a 
one-dimensional line search. The function evaluations cost 0(J) per iteration, and empirically it 
takes a small number of iterations independent of J to converge. In our data analysis section, we 
solve problems with more than 2 million tests in a few seconds on a desktop computer. 
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Figure 3: Power of four p-value weighting methods as a function of their parameter. Unweighted 
(solid), Bayes (dashed) as a function of the dispersion 0, exponential (dotted) as a function of /3, 
and filtering (dot-dashed) as a function of \M\. The Spjotvoll weights correspond to the point at 
the origin 0 = 0 on the Bayes weights curve. 


Now we move to presenting our result for the general case. 

Theorem 3.5 (Weights in the general case). For any q G [0,1], the non-convex Bayes weights 
problem ean he solved for a nearby g* G [0,1], for whieh |g* — q\ < 1/(2J). The optimal weights 
and g* ean he found in 0(Jlog J) steps. 

This result is most relevant for the settings when a = Jq^ the expected number of errors under 
the null, is set to at least 1/2. If so, and especially for large J, our weights will be optimal for a 
g* that is close to q. We see from the proof that even for large g, g* often equals q. The method 
also returns the value g*, which the user can inspect. It is then up to the user to choose whether 
to perform multiple testing adjustment at the original level q or at the new level g*. 

We note that the analysis of non-convex optimization problems is challenging in general. It is 
perhaps remarkable that the non-convex Bayes weights problem admits a nearly exact solution. 

4 Simulation studies 

4.1 Bayes weights are more powerful than competing methods 

We present two simulation studies to explore the empirical performance of our method. First we 
show that Bayes weights increase power more reliably than competing methods. We compare three 
methods of p-value weighting: Bayes, exponential, and filtering. 

For Bayes weights we multiply the variances by a dispersion factor 0: J\f{r]i^ 0^f)- By changing 
the dispersion, we test the robustness of our method to mis-specification of the prior variances. 
This corresponds to the same dispersion variable (f in our GWAS application given in the next 
section. The dispersion ranges from 0 to 4. Spjotvoll weights correspond to 0 = 0. 

Exponential weights with tilt /3 are defined as: Wi — exp(/3|77^|)/c, where c = ^^P(/3|^d)* 

This weighting scheme was proposed in (Boeder et ah, 2006), who recommend /3 = 2 as a default. 
We include the range /3 G [0,4]. As noted by Boeder et al. (2006), exponential weights are sensitive 
to large means. To guard against this sensitivity, we truncate the weights larger than 1/q and 
re-distribute their excess weight among the next largest weights. 

Filtering methods test only the most significant effects rji < M, and give them equal weights. 
This and related methods are known under many names, such as two-stage testing, screening, or 
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the proxy phenotype method (Rietveld et ah, 2014). We adopt the name filtering from Bourgon 
et ah ( 2010 ), who filter based on independent information in the current data set instead of prior 
information. The threshold M ranges from —4 to 0. If |M| is large and fewer than Jq hypotheses 
would be tested, then we instead test the most significant Jq. 

The simulation is conducted as follows: We generate J — 1000 random means and variances 
independently according to rji ^ A/’(0,1), ~ |A/’(0,1)|. We set q — 10“^. For any weight vector 

re, we calculate the power as the objective from (3) divided by J, to reflect the average power per 
test. 

The results are shown in Figure 3. Each method can improve the power over unweighted testing. 
However, Bayes weights lead to more power than the other methods. The best power is attained 
when the dispersion 0=1, but good power is reached even when 0 is not 1. Our weights are robust 
to mis-specifying the dispersion. 

Note, in particular, that taking uncertainty into account helps. Spjotvoll weights, which assume 
fixed and known effects, and are shown on the figure as regularized weights with 0 = 0 , have less 
power than Bayes weights with positive 0, for a wide range of 0. 

The remaining two methods, filtering and exponential weights, have disadvantages. While 
filtering leads to power gain for a thresholding parameter M < 3/4, it also leads to a substantial 
power loss for M > 1. For sufficiently large M the power equals g, because only the top Jq 
hypotheses are selected. Further, it is a significant disadvantage that there is no principled way to 
choose M a priori without additional assumptions. 

Similarly, exponential weighting leads to at most a small gain in power, and usually leads to a 
loss. There also appears to be no simple, principled way to choose 0 a priori. 

We conclude that Bayes weights are quite insensitive to tuning and have uniformly good power. 
In contrast, exponential weighting and filtering are relatively sensitive and their power can drop 
substantially. Therefore, Bayes weights increase power more reliably than competing methods. 

4.2 Bayes weights have a worst-case advantage 

We show that Bayes weights have a worst-case advantage compared to Spjotvoll weights. We 
use the sparse means model: we generate J — 1000 means 77^, such that their distribution is 
77 ~ TTo^m + where m = —10“^ and M = —2. We set q = 10“^ and we vary tti from 0 to 0.1. 

We set all ai = a. 

We consider two values of a: 0 and 1. Spjotvoll is optimal for a = 0, while Bayes weights with 
a — 1 are optimal for 1. We evaluate these weighting schemes by calculating the objective that they 
do not maximize: the average power (3) for Spjotvoll and the deterministic power ( 1 ) for Bayes 
weighting. We also compute the power of the unweighted Bonferroni method. 

The results are in Figure 4. Bayes weights lose only a little compared to the optimal Spjotvoll 
((a) left). In contrast, Spjotvoll loses a lot compared to Bayes ((a) right). Bayes maximizes the 
worst-case power, thus showing a maximin property. 

Spjotvoll weights show a marked drop in power near tti = 0.07, as shown by its non-monotonic 
power curve in Figure 4 (a)). To understand this, we plot the two weighting schemes in Figure 4 
(b). Since there are only two classes, the weights also take two values. We see that the Spjotvoll 
weights are more extreme than the Bayes ones. They start putting weight equal to zero on the 
small means near tti = 0.07, which appears to lead to power loss. 
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(a) Power comparison for sparse means. (b) Weights of the two classes. 

Figure 4: (a) Deterministic (left) and average (right) power as a function of the proportion of large 
means tti. Three methods are compared: unweighted (solid), Spjotvoll (dashed), Bayes (dotted), 
(b) The weights of the two classes, large (solid) and small (dashed), for Spjotvoll (left) and Bayes 
(right), as a function of the proportion of large means tti. 

5 Application to Genome-Wide Association Studies 

5.1 Review of GWAS 

We adapt our framework to genome-wide association studies, relying on basic notions of quantitative 
genetics (see e.g. Lynch and Walsh, 1998). Our method applied to this problem is called iGWAS in 
our forthcoming application to human longevity. This section presents in detail the methodology 
for that application, while also illustrating the steps to use our framework for specific problems. 

Consider a model for GWAS in which we study a quantitative trait y in a population. Our goal 
is to understand the effects of single nucleotide polymorphisms (SNPs) ^ 1 ,^ 2 , • • • ^9j on the trait. 
We assume y has mean 0 and known variance, and gi denotes the centered minor allele count of 
variant i for an individual. We rely on the linear model for the effect of the i-th variant on the 
trait: y = 

We will show how our question can be framed in the Gaussian means model of hypothesis testing. 
In the above model y is the phenotype of a randomly sampled individual from the population. 
Accordingly, gi is random, Pi is a fixed unknown constant, and Si is the residual error. This error 
is a mean-zero random variable independent of with variance cr?. 

Suppose we observe a sample of N independent and identically distributed observations from 
this model. We use the standard linear regression estimate /3^, which for large sample size has an 
approximate distribution ^ cr|/var(^^)}. We can standardize if we divide by r^, 

where rf — cr|/var(^^) is the variance of 

With these steps, we have framed our problem in the Gaussian means model. Denoting Ti = 
fii — we have Ti ^ A/’(/ii,l), which is the required form. Denote also the 

standardized effect size Ui = /3^/t^, which will be of key importance. 
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5.2 Prior Information 

Now we show how to use prior information. Assume we also have a prior trait yo measured 
independently on a different, independent sample from the same population. If our assumptions 
also hold for yo^ we can write yo = gil3oi + Soi- Here f3oi is a fixed unknown constant, and eoi is 

random. Suppose we have independent samples of size Ni and Noi for the two traits. If we define 

1 /2 

Toi? ^Oi by analogy to their definitions for we can write Toi 

We model the relatedness between the two traits as a relation between the standardized effect 
sizes z/, which do not depend on the sample size. If the two traits are closely related, the first order 
approximation is equality: Ui = z/qz- This simple model captures the pleiotropy between the two 
traits (e.g. Solovieff et ah, 2013). 

The final step is to compute the distribution of z/^ given the prior data Tq^. For this we need to 
choose a prior for z/^, and for simplicity we will use a fiat prior. 

We now have all ingredients to apply the model for Gaussian hypothesis testing with uncertain 
information. Specifically, we have jii •V(? 7 j,crf), where m r\i = {NijNoiY/'^TQi, and 

af = Ni/Noi. 

The uncertainty in Toi may be different from 1, for instance larger than 1 due to overdispersion. 
In addition, overdispersion is one way to weaken the first order approximation assumption. To 
allow for this, we recall the dispersion parameter (j) used in our simulation. We model the prior 
data as Toi ^ A/’(A^q/^z/oz, 0). Then the variance = (l)Ni/Noi. The default is 0 = 1. Finally, we 
compute Bayes weights Wi with parameters g, {rn^a^f)^ and run weighted Bonferroni on the current 
p-values. This fully specifies the method. For the reader’s convenience, the method is summarized 
in Algorithm (1). 


Algorithm 1 Multiple Testing with Bayes weights in GWAS 
Toi ^ prior effect sizes for i = 1,..., J 
Noi^ Ni ^ prior and current sample sizes 
Pi ^ current p-values 
q ^ significance threshold 
(j) ^ dispersion (default 0=1) 

Set the prior means and variances: rji = {Ni/Noi)^^‘^Toi^ a‘f = 
Compute Bayes weights defined via (3), with parameters g, {r]i^a‘^) 
Return indices i such that Pi < qwi 


5.3 Practical remarks 

It is important that we retain type I error control as soon as we have valid p-values, even if the 
modelling assumptions fail. Common deviations from our model are: ( 1 ) GWAS summary data 
sometimes only has the magnitude of the effects, and not their sign. In this case we have two 
choices. We may assume that the directions of effects are the same, and do a one-tailed test of the 
current effect in the prior direction. Alternatively, we can do a two-tailed test by including the 2 
tests {r]i^af) and {—r]i^af) for each z, for a total of 2J tests. (2) When the prior and current trait 
are not both quantitative, and one is binary, the model z/^ = z/qz should be re-examined. It is still 
convenient as a first approximation. (3) Benjamini-Hochberg may be used instead of Bonferroni, 
with any weights summing to J, for increased power (Genovese et ah, 2006). 
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6 Data Analysis 


6.1 Data Sources 

We illustrate our method by analyzing 5 publicly available genome-wide association studies of 
quantitative or binary traits. We use the association p-values, which are available for 500,000 
to 2.5 million SNPs. The five studies are: CARDIoGRAM and C4D for coronary artery disease 
(Schunkert et ah, 2011; Coronary Artery Disease (C4D) Genetics Consortium, 2011), and one each 
for the kidney trait estimated glomerular filtration rate (eCFR) creatinine (Kottgen et ah, 2010), 
blood lipids (Teslovich et ah, 2010), and schizophrenia (Schizophrenia Psychiatric Genome-Wide 
Association Study (GWAS) Consortium, 2011). CARDIoGRAM and C4D include non-overlapping 
samples. A detailed description appears in the Supplementary material. 

6.2 Specific Pairs 

We analyze three pairs of data sets, with specific motivation for each. First, we use CARDIoGRAM 
as prior information for C4D. This is a ‘positive control’ for our method, since both studies measure 
the same phenotype, coronary artery disease. Therefore the weights should increase power. We 
choose C4D as target because it has smaller sample size; hence prior information may increase 
power more substantially. 

Second, we use the blood lipids GWAS as prior information for schizophrenia. Andreassen 
et al. (2013) showed improved power with this pair. They developed and controlled the Bayesian 
conditional false discovery rate. This is not known to control a frequentist criterion. Our goal was 
to evaluate the power improvement using a frequentist method. As Andreassen et al. (2013) noted, 
there is a small overlap between the controls of the two GWAS (Section 10). 

Third, we used the eCFR creatinine GWAS as prior information for the C4D coronary artery 
disease study. Heart disease and renal disease are comorbid (eg. Silverberg et ah, 2004), so it is 
possible that this may improve power. Here the hypothesized improvement is not based on entirely 
rigorous arguments. 

6.3 Methods compared 

We run weighted Bonferroni multiple testing for each of the 5 weighting schemes. The prior data 
is Toi = $“^(Poi/2), where P{)i is the i-th prior p-value. The family-wise error rate is controlled at 
q = 0.05. 

The first four methods are: unweighted Bonferroni, where all weights equal 1; Spjotvoll weights 
with parameters jii — (A^^/A'oz)^^^Foi; Bayes weights defined in Section 5, with dispersion (j) = 0.1,1, 
and 10; and exponential weighting (Boeder et ah, 2006) with tilt /3 = 1,2, and 4, introduced in 
Section 4.1. 

The fifth and last method is filtering, which selects the smallest p-values in the prior study, 
and tests their SNPs in the current study. We use three p-value thresholds P < 10“2,10“^,10“^. 
Rietveld et al. (2014) propose a method to choose the optimal p-value threshold for filtering. 
This needs the genotypic correlation between the two traits and the additive heritability of the 
current trait. For complex traits these parameters are usually estimated with a large uncertainty. 
Substantial domain expertise is required to choose the right parameter. 
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6.4 Additional details 

We prune the significant SNPs for linkage disequilibrium (LD) using the DistiLD database (Palleja 
et ah, 2012). We LD prune the significant SNPs for each method by selecting one SNP from each 
LD block. Our data analysis pipeline is available from the first author. 

We compute a score for each method m with parameters p, on each data set d. This is 

defined as +1 if the method increases the number of hits compared to unweighted, 0 if it leaves 
it unchanged, and -1 otherwise. The score of a method m with parameters p is the sum of 

scores across data sets. The total Sm of the method m is the sum of scores across parameters. 

6.5 Results 

Table 1: Number of significant SNPs of five methods on three examples. Top: Results pruned 
for linkage disequilibrium (LD). Middle: results without LD pruning. Bottom: The score of each 
method. The methods compared are unweighted (Un); Spjotvoll (Spjot); Bayes with 0 = 0 -1,1,10; 
exponential (Exp) with (3 = 1,2,4 and filtering (Filter) with — log(P) = 2,4,6. CG stands for 
CARDIOGRAM 



Un 

Spjot 


Bayes (0) 


Exp(/3) 

Filter ( — 

log P) 

Parameter 



0-1 

1 

10 

1 

2 

4 

2 

4 

6 

LD Pruned 

CG ^ C4D 

4 

11 

10 

8 

4 

4 

5 

4 

10 

10 

6 

Lipids ^ SCZ 

4 

1 

1 

1 

5 

1 

0 

0 

2 

2 

2 

eGFRcrea —>■ C4D 

4 

2 

2 

4 

4 

4 

5 

4 

1 

0 

1 

Unpruned 

CG^ C4D 

29 

45 

44 

39 

29 

32 

34 

27 

40 

48 

34 

Lipids ^ SCZ 

116 

214 

214 

223 

123 

92 

0 

0 

217 

96 

39 

eGFRcrea —>■ C4D 

29 

18 

18 

23 

29 

29 

28 

19 

1 

0 

1 

Scoring 

Score 

0 

0 

0 

1 

1 

0 

0 

-1 

0 

-1 

-1 

Total 

0 

0 


sum 

= 2 


sum 

= -1 

sum = 

= -2 


The results of our data analysis are presented in Table 1. This table has the number of significant 
SNPs on the pairs of GWAS data in 6.2 using the weights in 6.3. We also present the LD pruned 
results, which are a proxy for the the number of independent loci found. 

The results are somewhat inconclusive. On the positive control example, all methods except 
exponential weighting improve power. Spjotvoll weighting and filtering have the largest number of 
SNPs. On the blood lipids example, methods generally lose power for LD pruned SNPs (except 
Bayes weights with (j) = 10); and methods can both lose and gain power for non-LD pruned 
SNPs (except Bayes weights which uniformly improves power). On the other hand, for the eGFR 
creatinine example, exponential weights show the best behavior. We also see that the default 0=1 
is never worse that both unweighted and Spjotvoll, and for the unpruned lipids example it is better. 

If we allow for tuning of parameters, Bayes weights show a good performance. They are either 
first or second in all examples, and other methods rank lower. However, since we don’t have a 
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principled way to tune the parameters, we do not pursue this way of evaluation. 

Instead, we look at the scoring method for evaluation. The only method with a positive score is 
our Bayes weights (0=1 and 10). The total score, summed across parameter settings, is also only 
positive for Bayes testing. This shows some promise for our method. However, from this analysis 
alone we cannot establish conclusively the relative merits of the methods. In future work it will be 
necessary to evaluate p-value weighting methods on more data sets. 
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Supplementary material 

The supplementary material is organized as follows: Accessing the software implementation of our 
method is described in Section 8. In Section 9 we give the proofs of all mathematical claims from 
the paper. In addition, in Subsection 9.5.2 we give Algorithm 2 to compute our weights. In Section 
10 describe our data sources in detail. 

8 Software 

8.1 End-user software 

We provide R and MATLAB implementations of the methods developed in this paper. They can 
be obtained from public git repositories or directly from the first author. The Matlab implemen¬ 
tation is available from https://github.com/edgardobriban/pvalue_weighting_matlab. The 
R implementation is available from https://github.com/edgardobriban/pvalue_weighting_r. 
An R package is under development. 

8.2 Reproducibility 

All computational results and analyses of this paper have been performed in a reproducible way. 
To reproduce the simulation results and figures, we provide the source code in the above-mentioned 
Matlab package at https://github.com/edgardobriban/pvalue_weighting_matlab. To repro¬ 
duce the data analyses, a separate repository has been created, and is publically available at 
https://bitbucket.org/edgardobriban/pvalue-weighting-gwas. 

9 Proofs 

9.1 Proof of Proposition 1 - Sparse Means 

Proof. The objective function we need to maximize in Wi is 

J 

i=l 
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We know that out of the J means ttqJ are equal to 0. For each of these, the summand in 
the objective simplifies to qwi. For the remaining tti J means, the same convex objective g{w) = 
— fii} is summed, and the objective becomes: 


q ^ Wi+ g{wi). 

Now, since g is convex, we have where w is the mean of the weights 

of large effects w = Hence, there is a set of optimal weights that take equal 

values for equal means. This proves the first claim in the proposition. 

If we call wq^wi the weights for 0 and M, the optimization problem takes the form 

max ttqWq + 7ri$ (^~^(qwi) — M) s.t. ttqWq + ttiWi = 1. 

WQ,wie[0,l/q] 

Next, recall that qwi < 1 and introduce a new set of variables q := ^~^{qwi). If qwi = 1, then 
Ci will take the value +oo in the extended real number system M = M U {oo}. All our calculations 
respect the rules of the extended number system, so this will not cause any problems. For instance, 
$(oo) is defined as 1 by continuity. 

Then the optimization problem becomes 

max_ TTo^^ (co) + 7ri$ (ci — M) s.t. ttq^ (cq) + 7ri$ (ci) = q. 

co,ciGM 

Substituting the second equation we find that we need to maximize 

g + TTi ($ (Ci - M) - $ (Ci)) 

subject to 7ri$ (ci) < q. Now it is easy calculus to check that the function c ^ $ (c — M) — $ (c) is 
strictly increasing on (—oc, M/2] and strictly decreasing on [M/2, oc). Therefore, if M/2 is feasible, 
i.e. 7ri$ (M/2) < g, then the maximum is achieved at ci = M/2. This leads to the claimed formula 
for wi — ^{ci)/q and the power p*. 

If M/2 is not feasible, then the maximum is achieved at the largest feasible value ci, defined by 
7ri$ (ci) = q. This again leads to the claimed formulas. □ 


9.2 Proof of Theorem 1 - Optimal weights for small q 


Similarly to Section 9.1, we introduce the new set of variables q := $ ^{qwi). We can equivalently 
rewrite our problem as 

J / . _ A J 

max $ I — -^ I s.t. = qJ. 

Clearly it is enough to find a scalar dual variable A such that we can solve the Lagrangian 
problem 

max - A - gj) s.t. ^$(ci) = gJ. 

cGM \ J \i=l J i=l 

The strategy is to study this penalized objective for each fixed A, find maximizers q(A), and 
then find a suitable A to make the constraint hold. 

For this we introduce a function /(c;77,7) that is the generic term in the Lagrangian function 
of the problem: 
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/(c;r?, 7 ) = $(^) -A$(c). 

We find the maximizer of / in the following lemma. 

Lemma 9.1. If X>1, then the maximum of f is reaehed at 

ri + jJr]^ + 2 (72 - 1) log( 7 A) 

Cl 0, 7 ; A) =- - - 2 —^ 

7^ — 1 

This function was called c in the main paper. Here we call it ci to distinguish it from the 
dummy variable c. 

Proof. Denoting the standard normal density by (/p, the derivative of / with respect to c is: 

Thus df /dc> 0 a and only if 

~ > 21 og( 7 A). (7) 

By assumption 7 > 1 . We have a quadratic inequality for c, and the associated quadratic 
equation has two roots ci,C 2 . If A > 1 both roots are real: ci < C 2 . Indeed it is easy to check that 
the smaller root is given in ( 6 ). Then / is increasing on (—oo,ci] and [c 2 ,oc), and decreasing on 
[ci, C 2 ]. Further, the limit of / at —oc is clearly 0, while at oc it is 1 — A. Therefore, if A > 1, ci is 
a global maximum of /. □ 

Therefore, if A > 1, the i-th summand in the Lagrangian is maximized at 01 ( 77 ^, 7 ^, A). To solve 
our problem it is enough to find a A > 1 such that J2i=i ^(ci(^ 7 , 7 z, A)) = Jq. 

Such a A exists by condition (5), as follows: ci is a decreasing function of A, thus the constraint 

A > 1 is equivalent to the sum of $(ci) evaluated at A = 1 to be greater than Jq. This is exactly 

the condition we assumed in (5). This finishes the proof. 

9.3 Proof of Proposition 2 - Recovering the SpjotvoII weights 

Let ( 7 ^ — l) ( 77 ^ + 27 ^ 1 og( 7 A)). Then 01 ( 77 , 7 ; A) equals after some calculation: 

77 + y/ 77 ^ + _ R? 1 _ rf + 27 ^ log( 7 A) 

7 ^ — 1 y^77^ + — 77 7^ “ 1 y^77^ + — 77 

As a ^ 0, we have 7 ^ 1 , 7 ^ 1 og( 7 A) ^ log(A), and R^ 0. For 77 < 0 this shows that the 

limit of Cl is 

77 ^ + 2 log(A) _ 77 ^ + 2 log(A) 

I 77 I — 77 2rj 

Recall that the SpjotvoII weight is 

=4 (f+^) hi- 

This shows that the we recover the SpjotvoII weights in the limit as cr ^ 0. 
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9.4 Proof of Proposition 3 - Simple sufficient condition 

We start by showing that for A = 1 

7 ]“^ + 7^ log(7^) 


ci(??,7; A) > -- 


2|hl 


Indeed, for negative 7] < 0 the left hand side equals: 
r] + 'y^ 7 f + (72 - 1) log(72) 


+ 7 ^ log ( 7 ^) 


^ \v\ + ( 7 ^ - 1) log(7^) 

and now the inequality follows immediately, because 7 > 1 , so 

l\Jrf + ( 7 ^ - l)log( 72 ) > 71 /^ > |77|. 

Thus the condition (5) of the theorem is satisfied if the following more explicit inequality holds. 


>j,. 


1=1 


2 |hi 


( 8 ) 


For this to hold it is clearly sufficient that there are M distinct indices such that (recall a = Jq) 

Vi + ll log(7| 


$ - 


2|hi| 

The above inequality is equivalent to 

\Vi\ ^ 77og(7|) 


> a/M. 


— I /MI • 


(9) 


2 21 77 ^ I 

By assumption there are M indices such that \r]i\/2 < li ^ 

\^a/M\/‘^‘ For these indices (9) is true, so that ( 8 ) holds. This shows that the original condi¬ 
tion (5) holds, finishing the proof. 


9.5 Proof of Theorem 2 - Optimal weights in the general case 

The proof builds on Theorem 1, but requires more detailed analysis. There are two parts: under¬ 
standing the monotonicity of the generic term in the Lagrangian, then using it to find the optimal 
weights. 


9.5.1 Monotonicity of the generic term 

We saw in Lemma 9.1 that if A > 1, / is maximized at ci. We also saw in the proof that there are 
two cases: the roots of (7) are either real or complex. If the roots are complex, then / is increasing 
and the supremum is at -hoo. The two roots are complex when the discriminant of the quadratic 
equation is negative: 

rf + 2 ( 7 ^ - 1 ) log( 7 A) < 0. 

This inequality is equivalent to the following bound for A, in terms of a new function 

A<;(r?, 7 ) . ( 10 ) 
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Therefore if A < then the supremum of / is at c = +oc. Otherwise there are two 

candidates for the supremum: c = +oo and ci. We want to compare the two candidate maxima. 

Denote the limit of / at infinity by Poo(A) = Poo (A; p, 7) and the value of / at ci, when defined, as 
Pi (A) = Pi (A; 77,7) =/(ci). Let the difference between the two extrema be (i(A; p, 7) =Pi(A;p,7) — 
Poo (^5 P? 7)- In following lemma we find the maximizer of / as a function of A. 

Lemma 9 . 2 . There is a unique value k{r]^ 7) in the interval [/(p, 7), 1 ] sueh that d{k{r]^ 7); P? 7) = 0 . 
For A < k{r]^j), the supremum of f is at +00, else it is at ci. For the speeial value A = k{r]^j) 
both values are equal 

Proof. We have explicitly Poo(A) = 1 — A and thus poo decreases linearly from 1 to 0 on the unit 
interval [ 0 , 1 ]. We also have by definition 

p^(A) = $ A)). 

Differentiating this expression with respect to A reveals 

dpi 1 fci- 7 ]\dci ( dci \ 

However, by the definition of ci we have p((ci — p)/7) — A(p(ci) = 0 . Indeed, ci was defined 
as one of the extrema of /, which leads to the above equation by taking derivatives. Hence the 
expression above simplifies to: dpi/d\ = —$(ci). 

This shows that pi is decreasing in A: the derivative belongs to ( — 1 , 0 ). We also know that 
Pi > 0 , because the value pi is a local maximum of / in c, and we have seen that / - as a function 
of c - increases from 0 (a value which it takes in the limit at —00) to /(ci) - and hence pi = /(ci) 
must necessarily be positive. We conclude that pi(l) > 0 = Poo(l) 

Finally, we note that pi(A) is well-defined precisely when ci(A) is. This happens when the 
expression inside the square root is non-negative, which means A > /(p,7). 

To summarize 

• d is defined on the interval A G [/(p,7), 1 ]. 

• d is a strictly increasing differentiable function on this interval, because its derivative is 

If = 1 - $(ci) > 0. 

• d(l) > 0. 

If we show that d has a unique root A;, our conclusion will follow. Given the above three 
statements, it’s enough to show that d(/(p,7)) < 0; then the claim follows by the intermediate 
value theorem. 

To check the condition d(/(p,7)) < 0 we note that ci(/(p,7)) = —hence pi(/) < Poo (0 
equivalent to 


$ 






< 1 - 1 , 


or after some rearrangement (77) <^(77j- 

Introducing the variable x — and eliminating rj by noting I 

obtain that all we need to show is 


exp(—x^( 7 ^ — 1 )/ 2)/7 we 
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exp 


$(x) < exp 


2 


$(7x)7. 


If 77 < 0, then X < 0, and the above will follow if the function m{x) = xexp $(x) is strictly 

increasing on (—oc, 0]. Checking this is an elementary calculus exercise. The cases 77 = 0 and 77 > 0 
are handled similarly. This finishes the proof. 


□ 


9.5.2 Method for computing weights 

In the previous section, we saw that for any i, if A < ki := ^( 77 ^, 7 ^), then the supremum of /(•; rn^^i) 
occurs at +oc, else it occurs at 01 ( 77 ^, 7 ^; A). For the optimal weights this shows: 

{ ^(ci(?7i,7i;A))/Q if A >/i:(r 7 i, 7 j), 

^{c\{r]inrA))/(l or l/q if A = k{rii,'yi), 

Ijq if A </i:(r?i, 7 j). 

We emphasize that the value of Wi can take either of two values for A = k{r]i^ji). We will 
unambiguously specify a choice later for each i, determined by the constraint on the sum of the 
weights. 

Now all that remains is to search for a suitable A such that the weights sum to J: 

If we find such a A, then by duality the weights Wi will solve our original problem. Let us denote 
the sum of the weights by W : 

J 

WiX) = 

i=l 

The function W{X) is unambiguously defined for ki^ which is what we will use at first. 
Each function Wi is decreasing in A: constant on the interval (0,A;i), and decreasing smoothly 
from Ti — ^{ci{r]i^^i]ki))/q to 0 on the interval (ki^oo). Further, the function Wi has a jump of 
size (1 — ri)lq at ki. 

Therefore VF is a decreasing function of A on [0, oc), going from J/q to 0. Further, if we sort the 
values ki such that A;(o) = 0 < k(^i^ < • • • < fc(j) < ^(j+i) = cc, then W is smooth on the intervals 
and has jumps of size (1 — r(^i^)/q at for 1 < i < J. 

Hence, for our problem of solving IF(A) = J, there are two possibilities: 

Case 1 There is an interval (A:(i), such that J belongs to the image of this interval under 

W. In this case, since the function W is strictly decreasing and continuous on this interval, there 
will exist a unique A in the interval such that IF(A) = J. In this case we can solve the original 
problem exactly. 

Case 2 There is no such interval. In this case, the present duality approach is unable to produce 
an exact solution to the original problem. 

However, we can get an approximate solution. Let us consider the values W{ki). Above we 
noted that Wi{ki) can take two possible values l/q and Vi. Hence W can also take more than 
one value at ki. Since several ki may be equal, W may take more than two possible values, 
because each summand Wi can be chosen in two ways. To understand this, let us call the distinct 
values of ki to be iFj, and assume Ki < K 2 < ... < K^. For brevity let us also add Kq — 0, 
— oc. Without loss of generality, we assume that the values ki cluster in the following way: 
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= ^( 1 ) = ^( 2 ) = ... = ^(zi )5 K 2 = ^(n+ 1 ) = ... = ^( 22)5 so on. We also define the sets Si 
containing the indices j for which kj = Ki. 

The value of W is now defined unambiguously on each interval and W is a smooth 

decreasing function on this interval. At Km^ W has a jump of size 

1 - ^(i) ^ ^m+l -im- r{{) 

q q 

By assumption, in the current case there is no interval (Ki^ A^z+i) whose image under W contains 
J. Therefore J is contained in one of the jumps occurring between these intervals, say the jump 
at Ki. Let us now choose \ — Ki. Let us also define the left and right limits of W dX Ki'. 
a+ = \Ymx^Ki,x<Ki W{x), a_ = \mix^Ki,x>KiW{x). 

Then a_ < a+ and the current case entails that J G [a_,a+]. Let us also denote the values of 
the jumps by the more compact notation 5 ^ = (1 — With these notations, we can now easily 

state that the possible values of W{Ki) are, for any set S d Si 

W{Ki,S)^a. + Y,st. 

tes 

These consitute at most different values, but some of them may be equal. Nonetheless, 
the values Si are between 0 and 1/g, so for any value x G [a_,a+], we can find a suitable subset S 
such that VL* = W{Ki^S) approximates x within l/{2q). In fact, for any ordering of the st values 

5 ^ 2 ,..., , looking at the two cumulative sums st^ -\- st^ + • • • + st^ scoring nearest to x, we 

can find one closer than l/{2q). This shows that the desired subset can be found in 0(15^1) steps; 
with more work we can possibly find a better packing. 

For this choice of the values Wi^ the sum of Wi equals VF*, and |VF* — J| < l/{2q). To summarize 
this second case: Even if we can’t find a dual variable A such that the constraint is satisfied, we 
can find one such that the sum of Wi equals VF*, and |VF* — J\ < l/{2q). 

In fact we have found the exact optimal weights for a slightly different problem. Define the scaled 
weights SJ = JwijW^^ then 5}^ solves the optimization problem 3 with the parameter q replaced 
by g* = Wq/J. This is clear if we notice that the optimization problem can be parametrized by 
g, and everything we’ve said so far holds for each fixed q. Thus, for any value of VF*, we just need 
to find a value g* such that the constraint holds. This amounts to 

J 

i=l 

and we find the claimed equation for g*. Further |g* — q\ = \q{J — VF*)/J| < 1 /( 2 J), as desired. 
This finishes the proof that q* exists, we just need to show how to find it. 

Final algorithm The final algorithm is sumarized in Algorithm 2. We compute the values ki 
for each i, using Brent’s method on the equation d{k) = 0. This takes 0(J) steps. Next, we sort 
the unique values ki (0(J log J) steps), and do a binary search on the values W{ki) to find the 
right interval for A. Calculating a value VF(A) takes 0(J) steps. The binary search takes 0(log J) 
function evaluations of VF, making the total cost of this step 0(Jlog J). 

Once we found the right interval, we have to deal with the two possibilities identified above. 
If there is an exact solution VF(A) = J, then after finding the right interval, Brent’s method is 
used to solve the equation (0(J) steps). On this interval, the function W is smooth with explicitly 
computable derivatives. If there is no exact solution, then we simply return the closest interval 
endpoint (0(J) steps). The overall cost is 0(J log J). 
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Algorithm 2 Method to compute weights 


1 : 

2 : 

3: 

4: 

5: 

6 : 

7: 

8 : 

9: 

10 : 

11 : 

12 : 

13: 

14: 

15: 

16: 

17: 

18: 

19: 

20 : 

21 : 

22 : 


23: 


24: 


77 ^, ^ prior means and variances i = 1,..., J 

q ^ significance threshold 
if condition (5) holds then 

Solve 7 ^, A)) = Jq for A G [l,oc) using Newton’s method 

return Wi = ^{ci{r]i,ji,X))/q, q* = q 


else 

Define d{X-,rj,-y) = A$(-ci(r/, 7 ; A)) - $ (-(ci(r?, 7 ; A) - rj)/^) 
for all i Solve d{ki]r]i^'yi) = 0 for ki G 1] using Brent 

Let the sorted unique values ki be 

Define »-(A) = ( f ^ ^ 

* y 1/q \i X <ki 

Define Sj(A) = /(A = h){'^/q - w~{X)) 

Define wf (A) = (A) + Sj(A) 

Define W+{X) = and W-{X) = 

Find the index j for which W^iKa)) > Jq > via binary search 

if >Jq> VF+(A(^-+i)) then 

Solve VF+(A) = Jq for A G using Brent 

return Wi = w^' (A), = q 


else 


Find the indices S — {ji,... ^jn} such that kj. = 

Find the largest index T such that r~ := + J2i<T (Kq)) <Jq 

Define = r“ + Sj, 


■^JT+I i^U)^ 


Let VF* be the closer of r 

-(Ku)> 
(KW) 
’“i, (7 j)) 


Define wf = 


77 ;. 


wHKii)) 


return Wi = Jw* /VF*, q"^ 


,r ' to Jg (break ties arbitrarily) 

ifi^S 

if i = jk^ for fc > T + 1 
if i = jk^ for k <T 

if i = JT+I 5 depending on the choice of 
= W^q/J. 


10 Data sources 

10.1 CARDIOGRAM - CAD 

This is a meta-analysis of 14 coronary artery disease (CAD) GWAS, comprising 22,233 cases and 
64,762 controls of European descent (Schunkert et ah, 2011). The study includes 2.3 million SNPs. 
In each of the 14 studies and for each SNP, a logistic regression of CAD status was performed on 
the number of copies of one allele, along with suitable controlling covariates. The resulting effect 
sizes were combined across studies using fixed effects or random effects meta-analysis with inverse 
variance weighting. 

10.2 C4D - CAD 

This is a meta-analysis of 5 heart disease GWAS, totalling 15,420 CAD cases and and 15,062 controls 
(Coronary Artery Disease (C4D) Genetics Consortium, 2011). The samples did not overlap those 
from CARDIoGRAM. The analysis steps were similar to CARDIoGRAM. 
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10.3 Chronic Kidney Disease Consortium (CKDGen) - eGFR creatinine 

This is a GWAS of kidney traits in 67,093 participants of European ancestry from 20 population- 
based cohorts (Kottgen et ah, 2010). eGFR creatinine (eGFRcrea) was the trait with the largest 
sample size. There is no reported overlap with the samples from C4D. The analysis steps were 
similar to the previous two studies. 

10.4 Blood Lipids 

This is a GWAS of blood lipids in a sample from European populations (Teslovich et ah, 2010). 
Triglyceride levels (TG) were one of the traits, with sample size 96,598, chosen here out of all lipids 
because of its previous appearance in (Andreassen et ah, 2013). Standard protocols for GWAS 
were used: linear regression analysis with study-specific covariates, combined using fixed-effects 
met a-analysis. 

10.5 Psychiatric Genomics Consortium - Schizophrenia 

This is a mega-analysis (i.e. using the raw data not just summaries) combining GWAS data 
from 17 separate studies of schizophrenia (SCZ), with a total of 9,394 cases and 12,462 controls 
(Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011). They 
tested for association using logistic regression of SCZ status on the allelic dosages. The overlap 
with the blood lipids study consists of 1,459 controls (12% of controls in the SCZ study), from the 
British 1958 Birth Cohort of the Wellcome Trust Case Control Consortium. 
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